Vocabulary of Machine Learning

- General Terms
- Broad Classes: AI, ML, Deep ML, Generative ML
- ML Map and Associated Vocabulary
- Neural Networks Map and Associated Vocabulary


Learning vs Regular Computation 
[Figure: classical computation contrasted with learning]
“Learning” - Operational Definition

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

~ Tom Mitchell, Carnegie Mellon University (on Machine Learning's Operational Definition)
The definition describes a loop:

- Learns from experience
- Performs task
- Learns more and further tweaks model
“Learning” - Minimal Mathematical Formulation

Assuming [hoping] that there exists an unknown functional relationship (mapping) between two “classes” of “entities”:

$f : X \to Y$

Samples of known mappings (input-output pairs) may or may not be available:

$\{(x^i, y^i)\}_{i=1}^{N} \subset X \times Y$

And, assuming that there is a class of candidate functions (Hypotheses), the goal is to choose (“learn”) the candidate function which will “satisfactorily” replicate this mapping.

We will need some sort of a “measure” to quantify “satisfactorily”.
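As a rough illustration of the formulation above, the sketch below picks, from a tiny hand-made hypothesis class, the candidate that best replicates a few known input-output pairs. The sample data, the three candidate functions, and the use of mean squared error as the “measure” are all assumptions made for this example, not something the slides prescribe.

```python
# Hypothetical samples of a mapping x -> y (assumed data, roughly y = 2x).
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# A small, hand-picked class of candidate functions (hypotheses).
hypotheses = {
    "h1: y = x": lambda x: x,
    "h2: y = 2x": lambda x: 2 * x,
    "h3: y = x^2": lambda x: x ** 2,
}


def mean_squared_error(h, data):
    """One possible measure of how 'satisfactorily' h replicates the mapping."""
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)


def learn(candidates, data):
    """Return the name of the candidate with the smallest error on the data."""
    return min(candidates, key=lambda name: mean_squared_error(candidates[name], data))


best = learn(hypotheses, samples)
print(best)
```

Changing the measure or the hypothesis class can change which candidate counts as “best”, which is why both appear explicitly in the formulation.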
“Learning” - Mathematically

[Diagram: Machine Learning]

- Target Function (Unknown): $f : X \to Y$
- Sample Set (Transformed Data): $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$. Optional (may not be available).
- Hypothesis Space: $H = \{h_1, h_2, \ldots, h_M\}$
- Learning Algorithm $A$: selects $g \in H$
- Final Hypothesis: $g : X \to Y$, with $g \approx f$
“Learning” - Mathematically

In a more general setting, we also recognize that:

- The mapping may be probabilistic in nature.
- We will need some “error measure” to choose the “best” candidate.

[Diagram components]

- Unknown target distribution $P(y \mid x)$: target function $f : X \to Y$ plus noise
- Probability distribution $P$ on $X$
- Training examples $(x_1, y_1), \ldots, (x_N, y_N)$
- Error measure $e(\cdot)$
- Hypothesis set $H$
- Learning algorithm
- Final hypothesis $g : X \to Y$, with $g(x) \approx f(x)$
“Data”

Data is what ML works with.

Numerical: made of numbers (age, weight, number of children, shoe size)
- Continuous: infinite options (age, weight, blood pressure)
- Discrete: finite options (shoe size, number of children)

Categorical: made of words (eye colour, gender, blood type, ethnicity)
- Ordinal: data has a hierarchy (pain severity, satisfaction rating, mood)
- Nominal: data has no hierarchy (eye colour, dog breed, blood type)


“Features” and “Labels”

“Features”: Some characteristics of Input Data. (In many ML applications the input data itself is called “features”.)

“Label”: Often another name for Output Data. (True labels may or may not be available for training.)

[Figure: Build Model F(X1, X2) = Y from a table whose columns are Features (X1, X2) and Labels (Y), with rows such as 3, 1.5, 78321 and 3, 3, 98712; then Use Model on New Data (X1, X2) to Predict.]
“Training Data”

Pairs of known Inputs (features) and their Outputs (labels).

$\{(x^i, y^i)\}_{i=1}^{N} \subset X \times Y$

Training data are the known mappings (input-output pairs). Such samples may or may not be available.
“Model”

A structure and corresponding interpretation that summarizes or partially summarizes a set of data, for description or prediction.

[Figure: Observational Data and Mathematical Model]

Example: a table of observed values (the cosines of 1°, 2°, ..., 45°):

.9998  .9994  .9986  .9976  .9962
.9945  .9925  .9903  .9877  .9848
.9816  .9781  .9744  .9703  .9659
.9613  .9563  .9511  .9455  .9397
.9336  .9272  .9205  .9135  .9063
.8988  .8910  .8829  .8746  .8660
.8572  .8480  .8387  .8290  .8192
.8090  .7986  .7880  .7771  .7660
.7547  .7431  .7314  .7193  .7071

The whole table is summarized by one model:

$\cos(\theta)$

which can in turn be approximated by a short series:

$\cos(\theta) \approx 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \frac{\theta^8}{8!}$
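To make the “model summarizes data” point concrete, the sketch below regenerates the 45 tabulated cosine values from the truncated series alone. It assumes the table entries are cosines of whole-degree angles from 1° to 45°, and the function name `cos_series` is made up for this example.

```python
import math


def cos_series(x, terms=5):
    """Truncated Taylor series: 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(terms))


# Rebuild the tabulated values (angles in degrees) from the formula alone.
table = [round(cos_series(math.radians(deg)), 4) for deg in range(1, 46)]

# Largest gap between the five-term series and the library cosine on this range.
worst = max(abs(cos_series(math.radians(d)) - math.cos(math.radians(d))) for d in range(1, 46))
print(table[:5], worst)
```

A five-term formula replaces a 45-entry lookup table here, which is the sense in which a model “summarizes” the data.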
“Algorithm”

An algorithm is a set of step-by-step instructions that describe how to perform a task.

Suppose you have to learn [...] and [...] iteratively with the following rules:

- We initialize [...]
- After each iteration, either [...] or [...] can go up or down by 1, but not both.
- Update criteria is to minimize [...], where [...] = desired value, and [...] = obtained value.
Types Of Algorithms

- Brute Force Algorithm
- Recursive Algorithm
- Dynamic Programming Algorithm
- Divide And Conquer Algorithm
- Greedy Algorithm
- Backtracking Algorithm
- Randomized Algorithm
“Brute Force Algorithm”

Systematically checks all possible candidates against a given criteria (“exhaustive search”).

Strategy: try all possible paths to find those with maximum sum.

Homework: what could be the drawbacks [benefits]?
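A minimal sketch of the “try all possible paths” strategy follows. The number triangle and the rule that each step moves to one of the two adjacent numbers below are assumptions made for illustration, since the slide's own figure is not recoverable.

```python
from itertools import product

# Hypothetical number triangle (assumed data): from each number, move down
# to one of the two adjacent numbers in the next row.
triangle = [
    [3],
    [7, 4],
    [2, 4, 6],
    [8, 5, 9, 3],
]


def brute_force_max_path(tri):
    """Enumerate every top-to-bottom path and keep the one with the largest sum."""
    best_sum, best_path = float("-inf"), None
    # Each path is a sequence of left (0) / right (1) moves.
    for moves in product((0, 1), repeat=len(tri) - 1):
        col, path = 0, [tri[0][0]]
        for row, move in enumerate(moves, start=1):
            col += move
            path.append(tri[row][col])
        if sum(path) > best_sum:
            best_sum, best_path = sum(path), path
    return best_sum, best_path


print(brute_force_max_path(triangle))
```

With n rows there are 2^(n-1) paths to check, which hints at the main drawback: the answer is guaranteed to be the best, but the work grows exponentially.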
“Greedy Algorithm”

Systematically makes the locally optimal choice at each step (without regard to long-term optimality).

Strategy: Make the locally best choice based on what is directly in front.

Homework: what could be the drawbacks [benefits]?

Gradient Descent, used heavily in Neural Networks, is a greedy algorithm.

[Figure: cost plotted against Weight (W), showing the Initial Weight ($w_{\text{old}}$), the Learning rate ($\alpha$), the New Weight ($w_{\text{new}}$), and the Minimum point of cost function.]

$w_{\text{new}} = w_{\text{old}} - \alpha \, \frac{\partial\,\text{cost}}{\partial w}$
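The weight update used by gradient descent can be sketched in a few lines. The quadratic cost function, the starting weight, and the learning rate below are hypothetical choices made only so the example runs; each step simply moves downhill from where it currently stands, which is what makes the method greedy.

```python
def cost(w):
    """Hypothetical cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the cost with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w_old, learning_rate=0.1, steps=100):
    """Repeatedly apply w_new = w_old - learning_rate * gradient(w_old)."""
    for _ in range(steps):
        w_new = w_old - learning_rate * gradient(w_old)
        w_old = w_new
    return w_old


w = gradient_descent(w_old=0.0)
print(w, cost(w))
```

On a cost surface with several valleys the same locally best steps can settle in a valley that is not the deepest one, which is the usual drawback of a greedy strategy.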
“Divide and Conquer Algorithm”

Recursively breaks down a problem into two or more subproblems of the same or related type, until these become simple enough to be solved directly.

Homework: what could be the drawbacks [benefits]?

[Figure: divide and conquer stages]
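Merge sort is a standard instance of this pattern and is used here only as an illustration; the slides do not name a specific algorithm. The list is split until each piece is trivially sorted, and the sorted pieces are then merged.

```python
def merge_sort(items):
    """Sort a list by splitting it, sorting each half, and merging the results."""
    if len(items) <= 1:  # simple enough to be solved directly
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # divide: solve the left half
    right = merge_sort(items[mid:])  # divide: solve the right half
    merged, i, j = [], 0, 0
    # Combine: repeatedly take the smaller front element of the two halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


print(merge_sort([5, 2, 9, 1, 5, 6]))
```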
“Dynamic Programming”

Repeatedly breaks down a problem into overlapping subproblems of similar type to save computations.

Dynamic Programming Paradigm: Overlapping Subproblems, Divide and Conquer, Optimal Substructure.

Strategy: Make a decision at each step considering the current problem and solution to the previously solved problem to calculate the optimal solution.

Homework: what could be the drawbacks [benefits]?


[Figure: a layered graph of paths from A (START) to J (GOAL)]

- Brute Force Algorithm would calculate all the paths from A-J and choose the shortest.
- Greedy Algorithm would choose the locally best option and may miss the global best.

Is there a better way?

E-H-J is shorter than E-I-J, irrespective of whether we come to E from B, C, or D!! With brute force, this is checked unnecessarily multiple times!

Could we save computations by breaking problem into overlapping subproblems?

- Why not check the minimum for E-J, F-J, and G-J and use that instead?
- Why not repeat this abstraction even on second layer?

Overlapping subproblems!
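The idea of solving E-J, F-J, and G-J once and reusing the answers can be sketched with memoization. The edge costs below are hypothetical (the figure's actual weights are not recoverable); only the layered shape A, {B, C, D}, {E, F, G}, {H, I}, J follows the slides.

```python
from functools import lru_cache

# Hypothetical edge costs for a layered graph A -> {B,C,D} -> {E,F,G} -> {H,I} -> J.
edges = {
    "A": {"B": 2, "C": 4, "D": 4},
    "B": {"E": 7, "F": 5, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 2, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
    "J": {},
}


@lru_cache(maxsize=None)
def shortest_to_goal(node, goal="J"):
    """Return (cost, path) of the cheapest route from node to goal.

    Each node's answer is computed once and cached, so the E-J, F-J and G-J
    subproblems are not re-solved for every way of reaching E, F or G.
    """
    if node == goal:
        return 0, (goal,)
    options = []
    for nxt, weight in edges[node].items():
        cost, path = shortest_to_goal(nxt, goal)
        options.append((weight + cost, (node,) + path))
    return min(options)


print(shortest_to_goal("A"))
```

With these assumed costs, always taking the cheapest outgoing edge gives A-B-F-I-J at a cost of 14, while the cached search finds A-C-E-H-J at a cost of 11, so the example also shows the greedy approach missing the global best.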
By Any Other Name...

INPUT TERMS

OUTPUT TERMS
- Responses
- Dependent variables
Relationship: AI, ML, and Deep Learning

- Artificial Intelligence: Development of smart systems and machines that can carry out tasks that typically require human intelligence.
- Deep Learning: Uses an artificial neural network to reach accurate conclusions without human intervention.

What Else Does AI Cover (Other Than ML)?

[Figure: map of Artificial Intelligence; visible labels include Recurrent, Convolutional, Modular]
What About Generative AI?

- Artificial Intelligence: The theory and methods to build machines that think and act like humans.
- Expert System AI: Programmers teach AI exactly how to solve specific problems by providing precise instructions and steps.
- Machine Learning: The ability for computers to learn from experience or data without human programming.
- Generative AI: Generates new text, audio, images, video or code based on content it has been pre-trained on.

Source: aiforeducation.io
The ML Map

Machine Learning
- Classical Learning
  - Regression: Linear Regression, Polynomial Regression, Lasso and Ridge Regression
  - Classification: KNN, Logistic Regression, Naive Bayes, Decision Tree, SVM
  - Clustering: Fuzzy C-Means, k-Means, DBSCAN, Mean-Shift
  - Pattern Search: Eclat, Apriori, FP-Growth
  - Dimensionality Reduction and Visualization: PCA, LDA, SVD, t-SNE, QDA, LSA, LLE
- Ensemble Learning
  - Random Forest
  - Stacking
  - Boosting: AdaBoost, XGBoost, GradientBoost, CatBoost, LightGBM
- Reinforcement Learning: Q-Learning, Deep Q-Network (DQN), A3C, Genetic Algorithm, SARSA
- Artificial Neural Networks
  - Multi Layer Perceptron (MLP)
  - Convolutional Neural Networks (CNN), DCNN
  - Recurrent Neural Networks (RNN)
  - Modular Neural Networks
  - seq2seq
  - Radial Basis Function Neural Networks (RBFNN)
  - Generative Adversarial Networks (GANs)


Main Types of Machine Learning Systems

- Classical ML: Enough data; Defined features.
- Reinforcement Learning: No data, but there's an environment to interact with.
- Ensemble Learning: When classical ML is not enough.
- Artificial Neural Nets: Complicated data; Unclear features; Belief in a miracle.
Classical Machine Learning
aka Statistical Inference

[Figure: the ML map, repeated]


Regression

Kind of another name for “Curve Fitting”

Obs   hours   score
1     1       64
2     2       66
3     4       76
4     3       73
5     5       74
6     6       ?
7     6       83
8     ?       82
9     8       80
10    10      88
11    11      84
12    3       82
13    12      91
14    12      93
15    14      89

[Figure panels: Simple Linear Regression; Nonlinear]

Q. Why do we do it?
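A straight-line fit to the hours and score table can be sketched with ordinary least squares. The rows below are the table entries as they appear in the extracted text, skipping the two observations (6 and 8) that each have an illegible cell; the function name `fit_line` and the choice of ordinary least squares are assumptions for this illustration.

```python
# (hours, score) rows as they appear in the extracted table; observations with
# an illegible cell are left out rather than guessed.
data = [(1, 64), (2, 66), (4, 76), (3, 73), (5, 74), (6, 83), (8, 80),
        (10, 88), (11, 84), (3, 82), (12, 91), (12, 93), (14, 89)]


def fit_line(points):
    """Ordinary least squares for score = intercept + slope * hours."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(data)
print(round(intercept, 2), round(slope, 2))
```

One answer to “Why do we do it?” is visible here: once the line is fitted, `intercept + slope * hours` gives a score estimate for study times that never appeared in the table.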
Regression

Kind of another name for “Curve Fitting”
Broader Category: Supervised Learning

[Figure: INPUT TERMS (Independent variables) and OUTPUT TERMS (Dependent variables), labelled as Training Data]
Multiple Linear Regression

Multiple independent variables. One dependent variable.

[Table: per-student Study Hours and Prep Exams (independent variables) against Final Exam Score (dependent variable)]


Regression

Sometimes we fit a probability curve to a “categorical” dependent variable, rather than a truly numeric one.

Hours ($x_k$): 0.50  0.75  1.00  1.25  1.50  1.75  1.75  2.00  2.25  2.50  2.75  3.00  3.25  3.50  4.00  4.25  4.50  4.75  5.00  5.50
Pass ($y_k$):  0     0     0     0     0     0     1     0     1     0     1     0     1     0     1     1     1     1     1     1

[Figure: Probability of passing exam versus hours of studying. x-axis: Hours studying; y-axis: Probability of passing exam, from 0.00 to 1.00.]
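The probability curve for the hours and pass data can be sketched as a logistic fit. The data are the 20 pairs from the table above; fitting by plain gradient descent, and the learning rate and step count used, are assumptions chosen to keep the example dependency-free rather than the method the slides necessarily had in mind.

```python
import math

hours = [0.50, 0.75, 1.00, 1.25, 1.50, 1.75, 1.75, 2.00, 2.25, 2.50,
         2.75, 3.00, 3.25, 3.50, 4.00, 4.25, 4.50, 4.75, 5.00, 5.50]
passed = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1]


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=20000):
    """Fit p(pass) = sigmoid(b0 + b1 * x) by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= learning_rate * sum(errors) / n
        b1 -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1


b0, b1 = fit_logistic(hours, passed)
for h in (1.0, 2.5, 4.0):
    print(h, round(sigmoid(b0 + b1 * h), 2))
```

The fitted curve returns a probability for any number of hours; thresholding it (for example at 0.5) turns the regression curve into a pass or fail decision.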


Classification

Kind of “finding delimiting curve[s]”
Broader Category: Supervised Learning

- Training: Labelled data is provided to the system, and it finds class boundaries (“delimiting curve[s]”).
- Identification: New data is given to system and it labels it (i.e., identifies which class it belongs to).

[Figure: Linear boundary between two classes]
[Figure: Regression Curve compared with Classification Curve]
Classification

Kind of “finding delimiting curve[s]”

[Figure: the Logistic (Categorical) Regression Curve for Probability of passing exam versus hours of studying, shown alongside the Classification Curve. A Student Profile goes in and the output is Pass Or Fail.]
Classification

We are not limited to linear boundaries!

- Linearly separable: A linear decision boundary that separates the two classes exists (linear boundary).
- Not linearly separable: No linear decision boundary that separates the two classes perfectly exists (nonlinear boundary).


ES691 - Mathematics for Machine Learning / Dr. Naveed R. Butt @ GIKI - FES 


Classification

Neither are we limited to two dimensions!

[Figure: a kernel function maps the input space to the feature space]

In fact, a nonlinearly separable problem in lower dimensions could be linearly solvable in higher dimensions!
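The claim can be checked on a toy case. In the sketch below, one-dimensional points cannot be split by a single threshold on x, but after lifting each point to (x, x²) a single threshold on the new coordinate separates them. The data, the explicit feature map, and the threshold value are assumptions for illustration; this uses an explicit mapping rather than a kernel function proper.

```python
# Hypothetical 1-D data: class 1 if the point lies far from the origin.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, 0, 0, 0, 1, 1]


def feature_map(x):
    """Lift a 1-D input into a 2-D feature space (x, x^2)."""
    return (x, x * x)


def linear_classifier(features, threshold=1.0):
    """A linear rule in feature space: compare the second coordinate to a threshold."""
    _, squared = features
    return 1 if squared > threshold else 0


predictions = [linear_classifier(feature_map(x)) for x in points]
print(predictions == labels)
```

In the original space class 1 sits on both sides of class 0, so no single cut point on x works; in the lifted space the horizontal line x² = 1 does the job.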
Classification

We are also not limited to just two classes!

[Figure: Binary classification compared with Multi-class classification; a Machine Learning Model assigns the input to a single class, e.g. Plane. Credit: Zoumana KEITA]
Classification

Multi-Label Classification

We could even identify multiple labels within one input!

[Figure: a Machine Learning Model outputs Predictions over the labels Boat, Cat, Dog, Plane. Credit: Zoumana KEITA]
Classification

Finally, keep in mind that classification is not generally as simple as this! One often has to decide what kind of losses (misclassifications) to accept.
Clustering

“Grouping without training!”
Broader Category: Unsupervised Learning

How many different language characters do you see? Even though you probably do not know any of these languages, you can attempt to answer that question.

Without having been trained on the scripts, you can probably make out similarities and differences, and use these for potential grouping (“clustering”).

- Classification: Supervised learning
- Clustering: Unsupervised learning

[Figure: Input Raw Data, then Algorithm, then Output]

e.g., automatically group customers into different segments based on their purchasing behavior.
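A minimal sketch of grouping without labels, using k-means as one possible clustering method (the slides do not single out an algorithm here). The customer data, the two features, and the choice of starting centroids are all hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket size).
customers = [(1, 10), (2, 12), (1, 11), (9, 50), (10, 52), (8, 49)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
print(clusters)
```

No labels are supplied at any point; the two segments emerge only from how close the customers are to each other.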
Association

“... Links”
Broader Category: Unsupervised Learning

"93% of people who purchased item A also purchased item B"
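A statement of this form is what association-rule mining usually calls the confidence of the rule A → B. The sketch below computes it on a handful of made-up shopping baskets; the data and the helper names `support` and `confidence` are assumptions for illustration.

```python
# Hypothetical shopping baskets (assumed data).
baskets = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


rule_confidence = confidence({"A"}, {"B"}, baskets)
print(f"{rule_confidence:.0%} of people who purchased item A also purchased item B")
```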




Pattern Recognition

“Discovering Regularities in Data”
Broader Category: Unsupervised Learning

Numbers: 0 1 1 2 3 5 8 13 21 34 ...

The regularity: each number is the sum of the previous two, e.g. 13 + 21 = 34.
Dimensionality Reduction

“Removing data attributes that are of least or no importance to task at hand”
Broader Category: Unsupervised Learning

Data may contain unnecessary details!

LinkedIn: I'm honored and thrilled to announce that I have been selected among the top 5 applicants who participated in the professional and most respected exam which evaluates the skill and ability to operate fuel-based vehicles. I cannot wait to see what the next chapter holds, and I cannot express my appreciation to the ministry of transportation, Google, NASA, and my neighbors who supported me during this challenging journey.

Reality: I got my Driving License

A sparse representation of data may be possible!

[Figure: a signal plotted as Amplitude versus Time, and its Fourier representation plotted against Frequency (Hz)]

May be possible to reduce dimensions!

[Figure: scatter plot of Variable 1 versus Variable 2; most variation in data is in one direction]

After readjusting our axes, data mainly changes along one dimension.
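Readjusting the axes so that most of the variation falls along one of them is the idea behind principal component analysis. The sketch below finds that main axis for a few two-dimensional points using the closed-form eigen-decomposition of the 2x2 covariance matrix. The data points and the function name `principal_axis` are hypothetical.

```python
import math


def principal_axis(points):
    """Return (unit direction, share of variance) of the main axis of 2-D points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Closed-form largest eigenvalue of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = half_trace + spread
    # Angle of the eigenvector that belongs to the largest eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))
    return direction, largest / (sxx + syy)


# Hypothetical data lying close to the line Variable 2 = Variable 1.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
direction, share = principal_axis(data)
print(direction, share)
```

When the share of variance along the main axis is close to 1, keeping only the coordinate along that axis loses very little information, which is the sense in which two variables reduce to one.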
Dimensionality Reduction

May be possible to merge attributes!

[Table: customer attributes such as Customer Age and Customer Country]

[Figure: individual items (Twix, Milky Way, Skittles, Dum Dums, Blow Pops) merged into broader attributes (Chocolates, Lollipops)]

Why would we want to reduce dimensions?

Curse of Dimensionality!

- 10 data points with 10 attributes each give 100 attribute values.
- 10 data points with 5 attributes each give 50 attribute values.
- Million data points with 10 attributes each give 10 million attribute values.
- Million data points with 5 attributes each give 5 million attribute values.

A difference of 5 attributes per data point becomes a difference of 5 million!
That completes... Classical Machine Learning (aka Statistical Inference)

[Figure: the ML map, repeated]
Questions?? Thoughts??